Multi-Objective Reinforcement Learning

Authors

  • A. Nowé
  • K. Van Moffaert
  • M. M. Drugan
Abstract

In multi-objective reinforcement learning (MORL) the agent receives multiple feedback signals when performing an action. These signals can be independent, complementary, or conflicting. MORL is therefore the process of learning policies that optimize multiple criteria simultaneously. In this abstract, we briefly describe our extensions of single-objective multi-armed bandit and reinforcement learning (RL) algorithms that make them applicable in multi-objective environments. Below, we first summarize the main parts of our multi-objective multi-armed bandits (MOMAB) framework [2], which incorporates techniques from multi-objective optimization. A first, straightforward approach is to use a scalarization function that transforms the multi-objective environment into a single-objective one, so that it can be handled within the standard MAB framework. Since a single-objective environment in general yields a single optimum, we need to vary the scalarization function to generate a variety of elements of the Pareto optimal set. The efficiency of these MOMAB algorithms depends strongly on the type of scalarization. The linear scalarization is the most popular scalarization function due to its simplicity. However, a known problem with linear scalarization is its inability to find all the points of a non-convex Pareto set. The Chebyshev scalarization has the advantage that it can find all the points of a non-convex Pareto set, at the cost of introducing another parameter, i.e., the utopian point, that needs to be optimized. To measure the performance of a single scalarized MOMAB, we introduce the scalarized regret, defined as the difference between the maximum scalarized value and the scalarized value of a suboptimal arm. The upper bound of a multi-objective UCB1 that randomly alternates between a number of scalarized UCB1 instances is then the sum of the upper bounds of the individual UCB1 instances.
While this definition of regret seems natural, it only partially serves our goal, because it aggregates a collection of independent regrets instead of minimizing the regret of a multi-objective strategy.
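The scalarized-bandit idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the arm reward vectors, the equal weights, and the reference point `z` below are hypothetical choices, and a fixed reference point stands in for the tuned utopian point of the Chebyshev scalarization.

```python
import math

def linear_scalarize(mu, w):
    """Weighted sum of the objectives; simple, but it can miss
    Pareto-optimal arms on non-convex fronts."""
    return sum(wi * mi for wi, mi in zip(w, mu))

def chebyshev_scalarize(mu, w, z):
    """Weighted Chebyshev scalarization with respect to a reference
    point z dominated by every arm; maximizing the minimum weighted
    distance can also reach non-convex Pareto-optimal arms."""
    return min(wi * (mi - zi) for wi, mi, zi in zip(w, mu, z))

def scalarized_ucb1(reward_fn, n_arms, n_obj, scalarize, horizon):
    """UCB1 applied to the scalarized empirical mean reward vector.

    reward_fn(arm) returns a vector of n_obj rewards in [0, 1].
    Returns how often each arm was pulled over the horizon."""
    counts = [0] * n_arms
    means = [[0.0] * n_obj for _ in range(n_arms)]
    for t in range(horizon):
        if t < n_arms:               # initialization: pull each arm once
            arm = t
        else:                        # scalarized value + exploration bonus
            arm = max(range(n_arms),
                      key=lambda i: scalarize(means[i])
                                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fn(arm)
        counts[arm] += 1
        for o in range(n_obj):       # incremental mean per objective
            means[arm][o] += (r[o] - means[arm][o]) / counts[arm]
    return counts
```

With deterministic reward vectors such as (0.9, 0.1), (0.1, 0.9), and (0.6, 0.6) and equal weights, both scalarizations concentrate pulls on the compromise arm (0.6, 0.6), since it maximizes both the weighted sum and the weighted Chebyshev distance from the origin; varying the weights is what steers the learner toward different Pareto-optimal arms.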


Related articles

Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach

This paper presents the application of reinforcement learning in automatic analog IC design. In this work, the Multi-Objective approach by Learning Automata is evaluated for accommodating required functionalities and performance specifications considering optimal minimizing of MOSFETs area and power consumption for two famous CMOS op-amps. The results show the ability of the proposed method to ...


Multi-objective optimization perspectives on reinforcement learning algorithms using reward vectors

Reinforcement learning is a machine learning area that studies which actions an agent can take in order to optimize a cumulative reward function. Recently, a new class of reinforcement learning algorithms with multiple, possibly conflicting, reward functions was proposed. We call this class of algorithms the multi-objective reinforcement learning (MORL) paradigm. We give an overview on multi-ob...


Multi-Objectivization in Reinforcement Learning

Multi-objectivization is the process of transforming a single objective problem into a multi-objective problem. Research in evolutionary optimization has demonstrated that the addition of objectives that are correlated with the original objective can make the resulting problem easier to solve compared to the original single-objective problem. In this paper we investigate the multi-objectivizati...


Multi-Objective Deep Reinforcement Learning

We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the fir...


A Multi-Objective Deep Reinforcement Learning Framework

This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose linear and non-linear methods to develop the MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on a deep sea treasure environment indicate that the proposed approach is able to converge to the optimal Pareto solutions. ...


Balancing Learning and Engagement in Game-Based Learning Environments with Multi-objective Reinforcement Learning

Game-based learning environments create rich learning experiences that are both effective and engaging. Recent years have seen growing interest in data-driven techniques for tutorial planning, which dynamically personalize learning experiences by providing hints, feedback, and problem scenarios at runtime. In game-based learning environments, tutorial planners are designed to adapt gameplay eve...



Journal:

Volume   Issue

Pages  -

Publication date: 2013